library(topGO)
library(knitr) # to use kable()
library(limma) # to use vennDiagram()
library(ggplot2)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
isoforms = '../2019-07-26/transcripts/annotation.txt')
TAGDIR <- paste0('../2020-01-08/', TAG, '/')
This is the enrichment analysis for genes regulated by regime. Because I quantified the association of expression with regime in two different ways, this document has two sections. Section Variance reports the enrichment analysis performed with genes ordered by the proportion of expression variance explained by regime. Note that some genes with low variance may still be highly associated with regime, even if the fold change between levels of this factor is low. Section Differential expression uses an ordering of genes based on the significance of the differential expression between levels of regime, which does depend on fold change.
Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.
tagVariance <- read.table(paste0(TAGDIR, VAR, '_variance.txt'))
tagPValue <- read.table(paste0(TAGDIR, VAR, '_pvalue.txt'))
annotation <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))
To initialize the topGOdata object, I will need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, either the portion of variance explained by regime or the p-values of the differential expression test. The structure() function adds attributes to an object.
Variance <- structure(tagVariance[,1], names = row.names(tagVariance))
PValues <- structure(tagPValue[,1], names = row.names(tagPValue))
rm(tagVariance, tagPValue)
There are 18598 genes scored with a variance portion and a differential expression p-value. It should actually be the exact same genes. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.
head(annotation)
## tagname goterms
## 1 XLOC_000002 GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009 GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021 GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036 GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)
There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.
for (tag in unique(c(names(PValues), names(Variance)))) {
if (! tag %in% names(allgenes2GO)) {
allgenes2GO <- append(allgenes2GO,
structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
}
}
Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.
I generate three datasets, to analyse each of the three ontologies.
selection <- function(allScores) {return(allScores < 0.05)}
GOdata.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
| ontology | Num_Genes | Num_GO_terms |
|---|---|---|
| BP | 15162 | 1154 |
| MF | 17429 | 576 |
| CC | 12940 | 291 |
There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.
BP.elim <- runTest(GOdata.BP, algorithm = "elim", statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea <- runTest(GOdata.BP, algorithm = "lea", statistic = "ks")
MF.elim <- runTest(GOdata.MF, algorithm = "elim", statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea <- runTest(GOdata.MF, algorithm = "lea", statistic = "ks")
CC.elim <- runTest(GOdata.CC, algorithm = "elim", statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea <- runTest(GOdata.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1154 | 34 |
| BP | weight01 | 1154 | 23 |
| BP | lea | 1154 | 70 |
| MF | elim | 576 | 37 |
| MF | weight01 | 576 | 31 |
| MF | lea | 576 | 75 |
| CC | elim | 291 | 19 |
| CC | weight01 | 291 | 11 |
| CC | lea | 291 | 37 |
rm(ResultsSummary)
The topGO package does not pay much attention to what terms are significant because they are overrepresented and which ones are underrepresented among the most differentially expressed genes. I think it’s worth separating them, to facilitate the biological interpretation. Note that not all terms listed in the tables below are significant. The scores for the three methods (elim, weight01 and lea) are non-corrected p-values.
orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.pvalue.sigTerms <- sigTerms
BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BP.all[BP.all$Significant > BP.all$Expected,],
caption = "Most over-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0055085 | transmembrane transport | 500 | 260 | 185.43 | 1 | 6.9e-13 | 2.5e-12 | 7.8e-16 |
| 2 | GO:0006508 | proteolysis | 442 | 217 | 163.92 | 2 | 1.2e-09 | 4.6e-08 | 4.7e-11 |
| 3 | GO:0006468 | protein phosphorylation | 276 | 132 | 102.36 | 3 | 4.3e-06 | 3.7e-06 | 4.3e-06 |
| 4 | GO:0055114 | oxidation-reduction process | 376 | 175 | 139.44 | 4 | 3.6e-05 | 2.2e-05 | 3.6e-05 |
| 5 | GO:0006289 | nucleotide-excision repair | 13 | 11 | 4.82 | 6 | 0.00050 | 0.0005 | 0.00050 |
| 6 | GO:0007160 | cell-matrix adhesion | 16 | 11 | 5.93 | 11 | 0.00118 | 0.0012 | 0.00118 |
| 7 | GO:0007186 | G protein-coupled receptor signaling pat… | 206 | 94 | 76.40 | 12 | 0.00122 | 0.0014 | 0.00122 |
| 8 | GO:0007165 | signal transduction | 544 | 229 | 201.75 | 31 | 0.00849 | 0.0016 | 6.3e-05 |
| 9 | GO:0006813 | potassium ion transport | 56 | 29 | 20.77 | 7 | 0.00057 | 0.0017 | 0.00057 |
| 10 | GO:0006355 | regulation of transcription, DNA-templat… | 355 | 158 | 131.66 | 5 | 0.00018 | 0.0023 | 0.00018 |
| 11 | GO:0006470 | protein dephosphorylation | 58 | 31 | 21.51 | 8 | 0.00059 | 0.0023 | 0.00059 |
| 12 | GO:0034220 | ion transmembrane transport | 147 | 74 | 54.52 | 10 | 0.00096 | 0.0032 | 0.00096 |
| 13 | GO:0006979 | response to oxidative stress | 26 | 14 | 9.64 | 16 | 0.00343 | 0.0034 | 0.00343 |
| 14 | GO:0005992 | trehalose biosynthetic process | 6 | 5 | 2.23 | 17 | 0.00352 | 0.0035 | 0.00352 |
| 15 | GO:1901642 | nucleoside transmembrane transport | 7 | 6 | 2.60 | 19 | 0.00377 | 0.0038 | 0.00377 |
| 16 | GO:0044271 | cellular nitrogen compound biosynthetic … | 783 | 300 | 290.38 | 906 | 0.74664 | 0.0040 | 0.82339 |
| 17 | GO:0043161 | proteasome-mediated ubiquitin-dependent … | 45 | 23 | 16.69 | 23 | 0.00491 | 0.0049 | 0.00491 |
| 18 | GO:0048870 | cell motility | 14 | 8 | 5.19 | 174 | 0.05915 | 0.0062 | 0.05915 |
| 19 | GO:0003341 | cilium movement | 10 | 8 | 3.71 | 30 | 0.00770 | 0.0077 | 0.00770 |
| 21 | GO:0060271 | cilium assembly | 46 | 28 | 17.06 | 9 | 0.00064 | 0.0088 | 0.00064 |
| 22 | GO:0051726 | regulation of cell cycle | 43 | 19 | 15.95 | 411 | 0.22595 | 0.0090 | 0.36698 |
| 23 | GO:0046942 | carboxylic acid transport | 20 | 14 | 7.42 | 24 | 0.00527 | 0.0093 | 0.00527 |
| 24 | GO:0006367 | transcription initiation from RNA polyme… | 13 | 9 | 4.82 | 35 | 0.01065 | 0.0107 | 0.01065 |
| 25 | GO:0007017 | microtubule-based process | 195 | 96 | 72.32 | 27 | 0.00656 | 0.0108 | 0.00110 |
| 26 | GO:0007156 | homophilic cell adhesion via plasma memb… | 21 | 8 | 7.79 | 40 | 0.01237 | 0.0124 | 0.01237 |
| 27 | GO:0008272 | sulfate transport | 13 | 8 | 4.82 | 42 | 0.01265 | 0.0126 | 0.01265 |
| 28 | GO:0006383 | transcription by RNA polymerase III | 8 | 7 | 2.97 | 45 | 0.01296 | 0.0130 | 0.01296 |
| 29 | GO:0043628 | ncRNA 3’-end processing | 6 | 4 | 2.23 | 48 | 0.01399 | 0.0140 | 0.01399 |
| 30 | GO:0016998 | cell wall macromolecule catabolic proces… | 9 | 6 | 3.34 | 54 | 0.01555 | 0.0156 | 0.01555 |
| 31 | GO:0006032 | chitin catabolic process | 9 | 6 | 3.34 | 55 | 0.01555 | 0.0156 | 0.01555 |
| 32 | GO:0006821 | chloride transport | 8 | 6 | 2.97 | 64 | 0.01609 | 0.0161 | 0.01609 |
| 33 | GO:0006241 | CTP biosynthetic process | 5 | 5 | 1.85 | 65 | 0.01614 | 0.0161 | 0.01614 |
| 34 | GO:0046039 | GTP metabolic process | 5 | 5 | 1.85 | 70 | 0.01614 | 0.0161 | 0.01614 |
kable(
BP.all[BP.all$Significant < BP.all$Expected,],
caption = "Most under-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 20 | GO:1901566 | organonitrogen compound biosynthetic pro… | 432 | 139 | 160.21 | 690 | 0.50350 | 0.0079 | 0.89898 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term = Term(GOTERM[sigTerms]),
Definition = Definition(GOTERM[sigTerms]),
PValue=score(BP.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with',
VAR, 'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0055085 | transmembrane transport | The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. | 0.0000000 |
| GO:0006508 | proteolysis | The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. | 0.0000000 |
| GO:0006468 | protein phosphorylation | The process of introducing a phosphate group on to a protein. | 0.0000037 |
| GO:0055114 | oxidation-reduction process | A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. | 0.0000223 |
| GO:0006289 | nucleotide-excision repair | A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). | 0.0004978 |
| GO:0007160 | cell-matrix adhesion | The binding of a cell to the extracellular matrix via adhesion molecules. | 0.0011770 |
| GO:0007186 | G protein-coupled receptor signaling pathway | A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). | 0.0013843 |
| GO:0007165 | signal transduction | The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal (e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light), or for signal transduction in the absence of ligand, signal-withdrawal or the activity of a constitutively active receptor. Signal transduction ends with regulation of a downstream cellular process, e.g. regulation of transcription or regulation of a metabolic process. Signal transduction covers signaling from receptors located on the surface of the cell and signaling via molecules located within the cell. For signaling between cells, signal transduction is restricted to events at and within the receiving cell. | 0.0015717 |
| GO:0006813 | potassium ion transport | The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0016705 |
| GO:0006355 | regulation of transcription, DNA-templated | Any process that modulates the frequency, rate or extent of cellular DNA-templated transcription. | 0.0023402 |
| GO:0006470 | protein dephosphorylation | The process of removing one or more phosphoric residues from a protein. | 0.0023475 |
| GO:0034220 | ion transmembrane transport | A process in which an ion is transported across a membrane. | 0.0032023 |
| GO:0006979 | response to oxidative stress | Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of oxidative stress, a state often resulting from exposure to high levels of reactive oxygen species, e.g. superoxide anions, hydrogen peroxide (H2O2), and hydroxyl radicals. | 0.0034284 |
| GO:0005992 | trehalose biosynthetic process | The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. | 0.0035202 |
| GO:1901642 | nucleoside transmembrane transport | NA | 0.0037723 |
| GO:0043161 | proteasome-mediated ubiquitin-dependent protein catabolic process | The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, and mediated by the proteasome. | 0.0049073 |
| GO:0003341 | cilium movement | The directed, self-propelled movement of a cilium. | 0.0077007 |
| GO:0060271 | cilium assembly | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. | 0.0087982 |
| GO:0046942 | carboxylic acid transport | The directed movement of carboxylic acids into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. Carboxylic acids are organic acids containing one or more carboxyl (COOH) groups or anions (COO-). | 0.0092880 |
I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.
showSigOfNodes(GOdata.BP, score(BP.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 226
## Number of Edges = 427
##
## $complete.dag
## [1] "A graph with 226 nodes."
This is just a example of the most significant GO term:
showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.pvalue.sigTerms <- sigTerms
MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MF.all[MF.all$Significant > MF.all$Expected,],
caption = "Most over-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0005515 | protein binding | 2101 | 925 | 794.52 | 1 | 9.6e-14 | 1.5e-15 | 7.1e-16 |
| GO:0005509 | calcium ion binding | 361 | 197 | 136.52 | 2 | 5.0e-12 | 5.0e-12 | 5.0e-12 |
| GO:0005525 | GTP binding | 198 | 95 | 74.88 | 3 | 4.9e-05 | 4.9e-05 | 4.9e-05 |
| GO:0004672 | protein kinase activity | 274 | 127 | 103.62 | 5 | 6.5e-05 | 0.00011 | 6.5e-05 |
| GO:0003774 | motor activity | 131 | 69 | 49.54 | 10 | 0.00049 | 0.00013 | 0.00049 |
| GO:0022857 | transmembrane transporter activity | 583 | 282 | 220.47 | 55 | 0.01674 | 0.00032 | 2.5e-09 |
| GO:0005524 | ATP binding | 786 | 334 | 297.24 | 6 | 0.00035 | 0.00035 | 0.00035 |
| GO:0005507 | copper ion binding | 18 | 13 | 6.81 | 8 | 0.00042 | 0.00042 | 0.00042 |
| GO:0005200 | structural constituent of cytoskeleton | 17 | 11 | 6.43 | 9 | 0.00049 | 0.00049 | 0.00049 |
| GO:0004181 | metallocarboxypeptidase activity | 19 | 14 | 7.19 | 11 | 0.00061 | 0.00061 | 0.00061 |
| GO:0016491 | oxidoreductase activity | 417 | 188 | 157.69 | 50 | 0.01405 | 0.00071 | 0.00343 |
| GO:0004252 | serine-type endopeptidase activity | 88 | 43 | 33.28 | 12 | 0.00072 | 0.00072 | 0.00072 |
| GO:0061630 | ubiquitin protein ligase activity | 27 | 17 | 10.21 | 13 | 0.00113 | 0.00113 | 0.00113 |
| GO:0005230 | extracellular ligand-gated ion channel a… | 54 | 28 | 20.42 | 51 | 0.01507 | 0.00127 | 0.01507 |
| GO:0008138 | protein tyrosine/serine/threonine phosph… | 23 | 16 | 8.70 | 17 | 0.00181 | 0.00181 | 0.00181 |
| GO:0004198 | calcium-dependent cysteine-type endopept… | 28 | 16 | 10.59 | 21 | 0.00257 | 0.00257 | 0.00257 |
| GO:0003924 | GTPase activity | 138 | 67 | 52.19 | 22 | 0.00262 | 0.00262 | 0.00262 |
| GO:0004601 | peroxidase activity | 34 | 19 | 12.86 | 40 | 0.01130 | 0.00274 | 0.01130 |
| GO:0042626 | ATPase activity, coupled to transmembran… | 86 | 49 | 32.52 | 14 | 0.00116 | 0.00343 | 0.00116 |
| GO:0005337 | nucleoside transmembrane transporter act… | 7 | 6 | 2.65 | 24 | 0.00406 | 0.00406 | 0.00406 |
| GO:0016787 | hydrolase activity | 1382 | 648 | 522.62 | 20 | 0.00237 | 0.00478 | 7.8e-17 |
| GO:0008017 | microtubule binding | 62 | 36 | 23.45 | 27 | 0.00524 | 0.00524 | 0.00524 |
| GO:0016715 | oxidoreductase activity, acting on paire… | 14 | 10 | 5.29 | 28 | 0.00529 | 0.00529 | 0.00529 |
| GO:0004222 | metalloendopeptidase activity | 73 | 40 | 27.61 | 29 | 0.00570 | 0.00570 | 0.00570 |
| GO:0016840 | carbon-nitrogen lyase activity | 8 | 7 | 3.03 | 30 | 0.00572 | 0.00572 | 0.00572 |
| GO:0030248 | cellulose binding | 7 | 6 | 2.65 | 31 | 0.00647 | 0.00647 | 0.00647 |
| GO:0047631 | ADP-ribose diphosphatase activity | 6 | 5 | 2.27 | 32 | 0.00720 | 0.00720 | 0.00720 |
| GO:0005506 | iron ion binding | 72 | 39 | 27.23 | 16 | 0.00144 | 0.00740 | 0.00144 |
| GO:0005249 | voltage-gated potassium channel activity | 27 | 12 | 10.21 | 33 | 0.00748 | 0.00748 | 0.00748 |
| GO:0004930 | G protein-coupled receptor activity | 180 | 85 | 68.07 | 18 | 0.00182 | 0.00775 | 0.00182 |
| GO:0016791 | phosphatase activity | 100 | 53 | 37.82 | 66 | 0.02452 | 0.00996 | 0.00138 |
| GO:0030246 | carbohydrate binding | 51 | 31 | 19.29 | 39 | 0.01068 | 0.01022 | 0.00071 |
| GO:0020037 | heme binding | 95 | 49 | 35.93 | 42 | 0.01213 | 0.01213 | 0.01213 |
| GO:0008092 | cytoskeletal protein binding | 144 | 77 | 54.46 | 37 | 0.00965 | 0.01257 | 0.00037 |
| GO:0008271 | secondary active sulfate transmembrane t… | 13 | 8 | 4.92 | 46 | 0.01352 | 0.01352 | 0.01352 |
| GO:0043565 | sequence-specific DNA binding | 146 | 61 | 55.21 | 98 | 0.04742 | 0.01536 | 0.04742 |
| GO:0004568 | chitinase activity | 9 | 6 | 3.40 | 52 | 0.01589 | 0.01589 | 0.01589 |
kable(
MF.all[MF.all$Significant < MF.all$Expected,],
caption = "Most under-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MF.weight01)[sigTerms]),
caption = paste('Molecular function terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0005515 | protein binding | Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). | 0.0000000 |
| GO:0005509 | calcium ion binding | Interacting selectively and non-covalently with calcium ions (Ca2+). | 0.0000000 |
| GO:0005525 | GTP binding | Interacting selectively and non-covalently with GTP, guanosine triphosphate. | 0.0000489 |
| GO:0004672 | protein kinase activity | Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. | 0.0001111 |
| GO:0003774 | motor activity | Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. | 0.0001264 |
| GO:0005524 | ATP binding | Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. | 0.0003453 |
| GO:0005507 | copper ion binding | Interacting selectively and non-covalently with copper (Cu) ions. | 0.0004234 |
| GO:0005200 | structural constituent of cytoskeleton | The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. | 0.0004903 |
| GO:0004181 | metallocarboxypeptidase activity | Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0006050 |
| GO:0004252 | serine-type endopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). | 0.0007237 |
| GO:0061630 | ubiquitin protein ligase activity | Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. | 0.0011253 |
| GO:0008138 | protein tyrosine/serine/threonine phosphatase activity | Catalysis of the reactions: protein serine + H2O = protein serine + phosphate; protein threonine phosphate + H2O = protein threonine + phosphate; and protein tyrosine phosphate + H2O = protein tyrosine + phosphate. | 0.0018139 |
| GO:0004198 | calcium-dependent cysteine-type endopeptidase activity | Catalysis of the hydrolysis of nonterminal peptide bonds in a polypeptide chain by a mechanism using a cysteine residue at the enzyme active center, and requiring the presence of calcium. | 0.0025653 |
| GO:0003924 | GTPase activity | Catalysis of the reaction: GTP + H2O = GDP + phosphate. | 0.0026185 |
| GO:0042626 | ATPase activity, coupled to transmembrane movement of substances | Catalysis of the reaction: ATP + H2O = ADP + phosphate, to directly drive the active transport of a substance across a membrane. | 0.0034322 |
| GO:0005337 | nucleoside transmembrane transporter activity | Enables the transfer of a nucleoside, a nucleobase linked to either beta-D-ribofuranose (ribonucleoside) or 2-deoxy-beta-D-ribofuranose, (a deoxyribonucleotide) from one side of a membrane to the other. | 0.0040616 |
| GO:0016787 | hydrolase activity | Catalysis of the hydrolysis of various bonds, e.g. C-O, C-N, C-C, phosphoric anhydride bonds, etc. Hydrolase is the systematic name for any enzyme of EC class 3. | 0.0047809 |
| GO:0008017 | microtubule binding | Interacting selectively and non-covalently with microtubules, filaments composed of tubulin monomers. | 0.0052445 |
| GO:0016715 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen | Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. | 0.0052901 |
| GO:0004222 | metalloendopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0056977 |
| GO:0016840 | carbon-nitrogen lyase activity | Catalysis of the release of ammonia or one of its derivatives, with the formation of a double bond or ring. Enzymes with this activity may catalyze the actual elimination of the ammonia, amine or amide, e.g. CH-CH(-NH-R) = C=CH- + NH2-R. Others, however, catalyze elimination of another component, e.g. water, which is followed by spontaneous reactions that lead to breakage of the C-N bond, e.g. L-serine ammonia-lyase (EC:4.3.1.17), so that the overall reaction is C(-OH)-CH(-NH2) = CH2-CO- + NH3, i.e. an elimination with rearrangement. The sub-subclasses of EC:4.3 are the ammonia-lyases (EC:4.3.1), lyases acting on amides, amidines, etc. (EC:4.3.2), the amine-lyases (EC:4.3.3), and other carbon-nitrogen lyases (EC:4.3.99). | 0.0057185 |
| GO:0030248 | cellulose binding | Interacting selectively and non-covalently with cellulose. | 0.0064685 |
| GO:0047631 | ADP-ribose diphosphatase activity | Catalysis of the reaction: ADP-ribose + H2O = AMP + D-ribose 5-phosphate. | 0.0071963 |
| GO:0005506 | iron ion binding | Interacting selectively and non-covalently with iron (Fe) ions. | 0.0074013 |
| GO:0005249 | voltage-gated potassium channel activity | Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. | 0.0074770 |
| GO:0004930 | G protein-coupled receptor activity | Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. | 0.0077529 |
showSigOfNodes(GOdata.MF, score(MF.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 142
## Number of Edges = 191
##
## $complete.dag
## [1] "A graph with 142 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.pvalue.sigTerms <- sigTerms
CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
CC.all[CC.all$Significant > CC.all$Expected,],
caption = "Most over-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0016021 | integral component of membrane | 885 | 425 | 316.52 | 1 | 4.6e-16 | 1.4e-14 | 4.8e-22 |
| GO:0016020 | membrane | 1565 | 707 | 559.72 | 3 | 1.5e-05 | 1.1e-05 | 5.5e-26 |
| GO:0005887 | integral component of plasma membrane | 118 | 61 | 42.20 | 2 | 1.3e-05 | 4.1e-05 | 1.3e-07 |
| GO:0016459 | myosin complex | 33 | 23 | 11.80 | 4 | 4.9e-05 | 4.9e-05 | 4.9e-05 |
| GO:0098800 | inner mitochondrial membrane protein com… | 21 | 14 | 7.51 | 10 | 0.00191 | 0.00018 | 0.00191 |
| GO:0008076 | voltage-gated potassium channel complex | 13 | 8 | 4.65 | 6 | 0.00072 | 0.00072 | 0.00072 |
| GO:0090575 | RNA polymerase II transcription factor c… | 23 | 17 | 8.23 | 8 | 0.00087 | 0.00101 | 0.00087 |
| GO:0005576 | extracellular region | 140 | 65 | 50.07 | 11 | 0.00209 | 0.00286 | 0.00209 |
| GO:0005886 | plasma membrane | 170 | 84 | 60.80 | 5 | 0.00068 | 0.00416 | 1.1e-10 |
| GO:0042555 | MCM complex | 10 | 6 | 3.58 | 17 | 0.00893 | 0.00893 | 0.00893 |
| GO:0005634 | nucleus | 531 | 215 | 189.91 | 14 | 0.00673 | 0.00900 | 0.02945 |
| GO:0005743 | mitochondrial inner membrane | 33 | 22 | 11.80 | 21 | 0.01125 | 0.01125 | 0.00023 |
| GO:0098796 | membrane protein complex | 145 | 69 | 51.86 | 37 | 0.02782 | 0.01309 | 0.00525 |
| GO:0005929 | cilium | 27 | 18 | 9.66 | 20 | 0.01078 | 0.01334 | 0.00176 |
| GO:0044430 | cytoskeletal part | 152 | 73 | 54.36 | 45 | 0.03229 | 0.01571 | 0.01508 |
| GO:0034704 | calcium channel complex | 5 | 4 | 1.79 | 23 | 0.01677 | 0.01677 | 0.01677 |
| GO:0016591 | RNA polymerase II, holoenzyme | 20 | 14 | 7.15 | 9 | 0.00161 | 0.01836 | 0.00161 |
| GO:0005681 | spliceosomal complex | 12 | 6 | 4.29 | 24 | 0.01853 | 0.01853 | 0.01853 |
| GO:0031225 | anchored component of membrane | 6 | 4 | 2.15 | 30 | 0.02104 | 0.02104 | 0.02104 |
kable(
CC.all[CC.all$Significant < CC.all$Expected,],
caption = "Most under-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CC.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0016021 | integral component of membrane | The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0000000 |
| GO:0016020 | membrane | A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. | 0.0000106 |
| GO:0005887 | integral component of plasma membrane | The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0000405 |
| GO:0016459 | myosin complex | A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. | 0.0000485 |
| GO:0098800 | inner mitochondrial membrane protein complex | Any protein complex that is part of the inner mitochondrial membrane. | 0.0001819 |
| GO:0008076 | voltage-gated potassium channel complex | A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. | 0.0007178 |
| GO:0090575 | RNA polymerase II transcription factor complex | A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. | 0.0010086 |
| GO:0005576 | extracellular region | The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. | 0.0028572 |
| GO:0005886 | plasma membrane | The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. | 0.0041590 |
| GO:0042555 | MCM complex | A hexameric protein complex required for the initiation and regulation of DNA replication. | 0.0089259 |
showSigOfNodes(GOdata.CC, score(CC.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 76
## Number of Edges = 133
##
## $complete.dag
## [1] "A graph with 76 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)
I need to generate the topGO objects again, using the alternative gene ordering, based on the proportion of expression-level variance explained by regime. I miss a way to inform the topGOdata object that the score in this case is reversed, with respect to p-values: the higher it is, the more differentially expressed the gene is. To make sure that GO terms are tested in the same way than when using p-values, I will just reverse the proportion of variance explained by regime to its complement. Taking this into account, the subset of interesting genes (selection() function) must be defined as the lowest 10% scores, which are the 10% genes with largest portion of variance explained by regime.
selection <- function(allScores) {return(allScores <= quantile(allScores, probs = 0.10))}
GOdataVar.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdataVar.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdataVar.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
library(knitr)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), numGenes),
Num_GO_terms = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
| ontology | Num_Genes | Num_GO_terms |
|---|---|---|
| BP | 15162 | 1154 |
| MF | 17429 | 576 |
| CC | 12940 | 291 |
BPvar.elim <- runTest(GOdataVar.BP, algorithm = "elim", statistic = "ks")
BPvar.weight01 <- runTest(GOdataVar.BP, algorithm = "weight01", statistic = "ks")
BPvar.lea <- runTest(GOdataVar.BP, algorithm = "lea", statistic = "ks")
MFvar.elim <- runTest(GOdataVar.MF, algorithm = "elim", statistic = "ks")
MFvar.weight01 <- runTest(GOdataVar.MF, algorithm = "weight01", statistic = "ks")
MFvar.lea <- runTest(GOdataVar.MF, algorithm = "lea", statistic = "ks")
CCvar.elim <- runTest(GOdataVar.CC, algorithm = "elim", statistic = "ks")
CCvar.weight01 <- runTest(GOdataVar.CC, algorithm = "weight01", statistic = "ks")
CCvar.lea <- runTest(GOdataVar.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) length(score(X))),
Significant = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1154 | 37 |
| BP | weight01 | 1154 | 29 |
| BP | lea | 1154 | 69 |
| MF | elim | 576 | 41 |
| MF | weight01 | 576 | 28 |
| MF | lea | 576 | 75 |
| CC | elim | 291 | 17 |
| CC | weight01 | 291 | 13 |
| CC | lea | 291 | 35 |
rm(ResultsSummary)
orderedTerms <- names(sort(score(BPvar.weight01)))
significant.weight01 <- score(BPvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BPvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(BPvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.variance.sigTerms <- sigTerms
BPvar.all <- GenTable(GOdataVar.BP, elim=BPvar.elim, weight01=BPvar.weight01, lea=BPvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BPvar.all[BPvar.all$Significant > BPvar.all$Expected,],
caption = "Most over-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0055085 | transmembrane transport | 500 | 83 | 48.38 | 1 | 1.8e-12 | 1.7e-11 | 1.4e-14 |
| 2 | GO:0006508 | proteolysis | 442 | 78 | 42.77 | 2 | 1.5e-09 | 9.2e-10 | 1.6e-12 |
| 3 | GO:0055114 | oxidation-reduction process | 376 | 45 | 36.38 | 4 | 1.4e-05 | 3.8e-06 | 1.4e-05 |
| 4 | GO:0007186 | G protein-coupled receptor signaling pat… | 206 | 27 | 19.93 | 3 | 8.1e-06 | 4.9e-06 | 8.1e-06 |
| 5 | GO:0006468 | protein phosphorylation | 276 | 38 | 26.70 | 6 | 3.8e-05 | 6.5e-05 | 3.8e-05 |
| 7 | GO:0006355 | regulation of transcription, DNA-templat… | 355 | 36 | 34.35 | 7 | 0.00011 | 0.00024 | 0.00011 |
| 8 | GO:0006289 | nucleotide-excision repair | 13 | 3 | 1.26 | 10 | 0.00034 | 0.00034 | 0.00034 |
| 9 | GO:0007165 | signal transduction | 544 | 57 | 52.63 | 201 | 0.09401 | 0.00076 | 0.09401 |
| 11 | GO:0006470 | protein dephosphorylation | 58 | 7 | 5.61 | 9 | 0.00022 | 0.00095 | 0.00022 |
| 12 | GO:0048870 | cell motility | 14 | 5 | 1.35 | 65 | 0.01456 | 0.00192 | 0.01456 |
| 13 | GO:0006812 | cation transport | 272 | 32 | 26.32 | 17 | 0.00395 | 0.00193 | 1.1e-05 |
| 14 | GO:0003341 | cilium movement | 10 | 4 | 0.97 | 12 | 0.00196 | 0.00196 | 0.00196 |
| 15 | GO:0060271 | cilium assembly | 46 | 6 | 4.45 | 8 | 0.00018 | 0.00208 | 0.00018 |
| 17 | GO:0005992 | trehalose biosynthetic process | 6 | 4 | 0.58 | 14 | 0.00329 | 0.00329 | 0.00329 |
| 18 | GO:0007018 | microtubule-based movement | 115 | 13 | 11.13 | 11 | 0.00176 | 0.00333 | 6.4e-05 |
| 19 | GO:0006367 | transcription initiation from RNA polyme… | 13 | 5 | 1.26 | 16 | 0.00363 | 0.00363 | 0.00363 |
| 20 | GO:0046039 | GTP metabolic process | 5 | 2 | 0.48 | 19 | 0.00431 | 0.00431 | 0.00431 |
| 21 | GO:0046129 | purine ribonucleoside biosynthetic proce… | 5 | 1 | 0.48 | 20 | 0.00431 | 0.00431 | 0.00431 |
| 22 | GO:0007160 | cell-matrix adhesion | 16 | 6 | 1.55 | 24 | 0.00545 | 0.00545 | 0.00545 |
| 23 | GO:0070836 | caveola assembly | 7 | 1 | 0.68 | 27 | 0.00635 | 0.00635 | 0.00635 |
| 24 | GO:0006821 | chloride transport | 8 | 1 | 0.77 | 28 | 0.00671 | 0.00671 | 0.00671 |
| 25 | GO:0006979 | response to oxidative stress | 26 | 8 | 2.52 | 30 | 0.00720 | 0.00720 | 0.00720 |
| 26 | GO:0043161 | proteasome-mediated ubiquitin-dependent … | 45 | 6 | 4.35 | 31 | 0.00757 | 0.00757 | 0.00757 |
| 27 | GO:0051260 | protein homooligomerization | 25 | 3 | 2.42 | 33 | 0.00834 | 0.00834 | 0.00834 |
| 28 | GO:0008272 | sulfate transport | 13 | 5 | 1.26 | 34 | 0.00873 | 0.00873 | 0.00873 |
| 29 | GO:0042398 | cellular modified amino acid biosyntheti… | 17 | 3 | 1.64 | 575 | 0.43428 | 0.00906 | 0.43428 |
| 30 | GO:0006520 | cellular amino acid metabolic process | 117 | 13 | 11.32 | 831 | 0.69747 | 0.01044 | 0.69747 |
| 31 | GO:0006744 | ubiquinone biosynthetic process | 5 | 1 | 0.48 | 42 | 0.01115 | 0.01115 | 0.01115 |
| 32 | GO:0016236 | macroautophagy | 6 | 2 | 0.58 | 47 | 0.01116 | 0.01116 | 0.01116 |
| 33 | GO:0006241 | CTP biosynthetic process | 5 | 1 | 0.48 | 52 | 0.01249 | 0.01249 | 0.01249 |
| 34 | GO:0005991 | trehalose metabolic process | 14 | 6 | 1.35 | 57 | 0.01256 | 0.01256 | 9.1e-05 |
| 35 | GO:0006165 | nucleoside diphosphate phosphorylation | 16 | 3 | 1.55 | 272 | 0.15804 | 0.01282 | 0.15804 |
| 36 | GO:0009206 | purine ribonucleoside triphosphate biosy… | 27 | 3 | 2.61 | 380 | 0.25487 | 0.01283 | 0.25487 |
kable(
BPvar.all[BPvar.all$Significant < BPvar.all$Expected,],
caption = "Most under-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 6 | GO:0006813 | potassium ion transport | 56 | 4 | 5.42 | 5 | 1.8e-05 | 6.8e-05 | 1.8e-05 |
| 10 | GO:0034220 | ion transmembrane transport | 147 | 13 | 14.22 | 13 | 0.00274 | 0.00088 | 0.00274 |
| 16 | GO:0044271 | cellular nitrogen compound biosynthetic … | 783 | 66 | 75.76 | 902 | 0.78861 | 0.00311 | 0.84915 |
| 37 | GO:1901566 | organonitrogen compound biosynthetic pro… | 432 | 28 | 41.80 | 548 | 0.41388 | 0.01340 | 0.59848 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(BPvar.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0055085 | transmembrane transport | The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. | 0.0000000 |
| GO:0006508 | proteolysis | The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. | 0.0000000 |
| GO:0055114 | oxidation-reduction process | A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. | 0.0000038 |
| GO:0007186 | G protein-coupled receptor signaling pathway | A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). | 0.0000049 |
| GO:0006468 | protein phosphorylation | The process of introducing a phosphate group on to a protein. | 0.0000652 |
| GO:0006813 | potassium ion transport | The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000680 |
| GO:0006355 | regulation of transcription, DNA-templated | Any process that modulates the frequency, rate or extent of cellular DNA-templated transcription. | 0.0002382 |
| GO:0006289 | nucleotide-excision repair | A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). | 0.0003450 |
| GO:0034220 | ion transmembrane transport | A process in which an ion is transported across a membrane. | 0.0008806 |
| GO:0006470 | protein dephosphorylation | The process of removing one or more phosphoric residues from a protein. | 0.0009474 |
| GO:0006812 | cation transport | The directed movement of cations, atoms or small molecules with a net positive charge, into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0019254 |
| GO:0003341 | cilium movement | The directed, self-propelled movement of a cilium. | 0.0019603 |
| GO:0060271 | cilium assembly | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. | 0.0020769 |
| GO:0005992 | trehalose biosynthetic process | The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. | 0.0032902 |
| GO:0007018 | microtubule-based movement | A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. | 0.0033296 |
| GO:0006367 | transcription initiation from RNA polymerase II promoter | Any process involved in the assembly of the RNA polymerase II preinitiation complex (PIC) at an RNA polymerase II promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. Promoter clearance, or release, is the transition between the initiation and elongation phases of transcription. | 0.0036348 |
| GO:0046039 | GTP metabolic process | The chemical reactions and pathways involving GTP, guanosine triphosphate. | 0.0043146 |
| GO:0046129 | purine ribonucleoside biosynthetic process | The chemical reactions and pathways resulting in the formation of any purine ribonucleoside, a nucleoside in which purine base is linked to a ribose (beta-D-ribofuranose) molecule. | 0.0043146 |
| GO:0007160 | cell-matrix adhesion | The binding of a cell to the extracellular matrix via adhesion molecules. | 0.0054525 |
| GO:0070836 | caveola assembly | The aggregation, arrangement and bonding together of a set of components to form a caveola. A caveola is a plasma membrane raft that forms a small pit, depression, or invagination that communicates with the outside of a cell and extends inward, indenting the cytoplasm and the cell membrane. | 0.0063465 |
| GO:0006821 | chloride transport | The directed movement of chloride into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0067108 |
| GO:0006979 | response to oxidative stress | Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of oxidative stress, a state often resulting from exposure to high levels of reactive oxygen species, e.g. superoxide anions, hydrogen peroxide (H2O2), and hydroxyl radicals. | 0.0072014 |
| GO:0043161 | proteasome-mediated ubiquitin-dependent protein catabolic process | The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, and mediated by the proteasome. | 0.0075703 |
| GO:0051260 | protein homooligomerization | The process of creating protein oligomers, compounds composed of a small number, usually between three and ten, of identical component monomers. Oligomers may be formed by the polymerization of a number of monomers or the depolymerization of a large protein polymer. | 0.0083350 |
| GO:0008272 | sulfate transport | The directed movement of sulfate into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0087330 |
showSigOfNodes(GOdataVar.BP, score(BPvar.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 233
## Number of Edges = 447
##
## $complete.dag
## [1] "A graph with 233 nodes."
Below I plot variance portion, but for the termo found most significant when using p-values, for comparison.
showGroupDensity(GOdataVar.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdataVar.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdataVar.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MFvar.weight01)))
significant.weight01 <- score(MFvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MFvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(MFvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.variance.sigTerms <- sigTerms
MFvar.all <- GenTable(GOdataVar.MF, elim=MFvar.elim, weight01=MFvar.weight01, lea=MFvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MFvar.all[MFvar.all$Significant > MFvar.all$Expected,],
caption = "Most over-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0005515 | protein binding | 2101 | 233 | 209.27 | 1 | 5.3e-14 | 4.7e-16 | 1.1e-14 |
| 2 | GO:0005509 | calcium ion binding | 361 | 67 | 35.96 | 2 | 9.2e-13 | 9.2e-13 | 9.2e-13 |
| 3 | GO:0003774 | motor activity | 131 | 23 | 13.05 | 4 | 1.2e-05 | 1.2e-05 | 6.3e-06 |
| 4 | GO:0004930 | G protein-coupled receptor activity | 180 | 22 | 17.93 | 3 | 4.5e-06 | 6.3e-05 | 4.5e-06 |
| 5 | GO:0004252 | serine-type endopeptidase activity | 88 | 21 | 8.77 | 5 | 6.8e-05 | 6.8e-05 | 6.8e-05 |
| 6 | GO:0005524 | ATP binding | 786 | 83 | 78.29 | 6 | 0.00020 | 0.00020 | 0.00020 |
| 7 | GO:0004672 | protein kinase activity | 274 | 37 | 27.29 | 16 | 0.00210 | 0.00042 | 0.00210 |
| 8 | GO:0022857 | transmembrane transporter activity | 583 | 86 | 58.07 | 13 | 0.00108 | 0.00045 | 8.4e-11 |
| 9 | GO:0042626 | ATPase activity, coupled to transmembran… | 86 | 12 | 8.57 | 11 | 0.00079 | 0.00058 | 0.00079 |
| 10 | GO:0004198 | calcium-dependent cysteine-type endopept… | 28 | 5 | 2.79 | 8 | 0.00059 | 0.00059 | 0.00059 |
| 11 | GO:0005525 | GTP binding | 198 | 22 | 19.72 | 9 | 0.00070 | 0.00070 | 0.00070 |
| 12 | GO:0061630 | ubiquitin protein ligase activity | 27 | 4 | 2.69 | 12 | 0.00104 | 0.00104 | 0.00104 |
| 13 | GO:0004181 | metallocarboxypeptidase activity | 19 | 5 | 1.89 | 14 | 0.00147 | 0.00147 | 0.00147 |
| 14 | GO:0016787 | hydrolase activity | 1382 | 218 | 137.65 | 269 | 0.26288 | 0.00207 | 5.8e-18 |
| 15 | GO:0005230 | extracellular ligand-gated ion channel a… | 54 | 9 | 5.38 | 20 | 0.00253 | 0.00231 | 0.00253 |
| 17 | GO:0005507 | copper ion binding | 18 | 6 | 1.79 | 21 | 0.00259 | 0.00259 | 0.00259 |
| 18 | GO:0051015 | actin filament binding | 30 | 9 | 2.99 | 27 | 0.00350 | 0.00350 | 0.00350 |
| 19 | GO:0016840 | carbon-nitrogen lyase activity | 8 | 2 | 0.80 | 28 | 0.00381 | 0.00381 | 0.00381 |
| 20 | GO:0004550 | nucleoside diphosphate kinase activity | 5 | 1 | 0.50 | 31 | 0.00482 | 0.00482 | 0.00482 |
| 21 | GO:0004601 | peroxidase activity | 34 | 8 | 3.39 | 10 | 0.00078 | 0.00491 | 0.00078 |
| 22 | GO:0030246 | carbohydrate binding | 51 | 14 | 5.08 | 22 | 0.00259 | 0.00546 | 0.00259 |
| 23 | GO:0005200 | structural constituent of cytoskeleton | 17 | 7 | 1.69 | 33 | 0.00599 | 0.00599 | 0.00599 |
| 24 | GO:0016491 | oxidoreductase activity | 417 | 49 | 41.53 | 26 | 0.00340 | 0.00661 | 0.00340 |
| 26 | GO:0016772 | transferase activity, transferring phosp… | 459 | 54 | 45.72 | 315 | 0.35545 | 0.00758 | 0.59000 |
| 27 | GO:0003777 | microtubule motor activity | 98 | 10 | 9.76 | 37 | 0.00833 | 0.00833 | 0.00833 |
| 28 | GO:0047631 | ADP-ribose diphosphatase activity | 6 | 2 | 0.60 | 38 | 0.00847 | 0.00847 | 0.00847 |
| 29 | GO:0008271 | secondary active sulfate transmembrane t… | 13 | 5 | 1.29 | 43 | 0.01015 | 0.01015 | 0.01015 |
| 30 | GO:0020037 | heme binding | 95 | 14 | 9.46 | 45 | 0.01045 | 0.01045 | 0.01045 |
| 31 | GO:0008092 | cytoskeletal protein binding | 144 | 21 | 14.34 | 85 | 0.02962 | 0.01068 | 0.00102 |
| 32 | GO:0005506 | iron ion binding | 72 | 13 | 7.17 | 18 | 0.00228 | 0.01292 | 0.00228 |
| 34 | GO:0004983 | neuropeptide Y receptor activity | 9 | 3 | 0.90 | 54 | 0.01360 | 0.01360 | 0.01360 |
| 35 | GO:0070006 | metalloaminopeptidase activity | 5 | 2 | 0.50 | 56 | 0.01367 | 0.01367 | 0.01367 |
| 36 | GO:0008138 | protein tyrosine/serine/threonine phosph… | 23 | 4 | 2.29 | 57 | 0.01394 | 0.01394 | 0.01394 |
| 37 | GO:0004866 | endopeptidase inhibitor activity | 17 | 3 | 1.69 | 25 | 0.00332 | 0.01490 | 0.00332 |
| 38 | GO:0004555 | alpha,alpha-trehalase activity | 8 | 2 | 0.80 | 58 | 0.01496 | 0.01496 | 0.01496 |
| 39 | GO:0008324 | cation transmembrane transporter activit… | 284 | 32 | 28.29 | 66 | 0.01772 | 0.01601 | 0.00015 |
| 40 | GO:0005452 | inorganic anion exchanger activity | 6 | 2 | 0.60 | 65 | 0.01711 | 0.01711 | 0.01711 |
| 41 | GO:0008137 | NADH dehydrogenase (ubiquinone) activity | 12 | 2 | 1.20 | 67 | 0.01904 | 0.01904 | 0.01904 |
kable(
MFvar.all[MFvar.all$Significant < MFvar.all$Expected,],
caption = "Most under-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 16 | GO:0005249 | voltage-gated potassium channel activity | 27 | 1 | 2.69 | 19 | 0.00246 | 0.00246 | 0.00246 |
| 25 | GO:0043565 | sequence-specific DNA binding | 146 | 9 | 14.54 | 87 | 0.03060 | 0.00745 | 0.03060 |
| 33 | GO:0008017 | microtubule binding | 62 | 5 | 6.18 | 53 | 0.01355 | 0.01355 | 0.01355 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MFvar.weight01)[sigTerms]),
caption = paste('Molecular functions terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0005515 | protein binding | Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). | 0.0000000 |
| GO:0005509 | calcium ion binding | Interacting selectively and non-covalently with calcium ions (Ca2+). | 0.0000000 |
| GO:0003774 | motor activity | Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. | 0.0000119 |
| GO:0004930 | G protein-coupled receptor activity | Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. | 0.0000627 |
| GO:0004252 | serine-type endopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). | 0.0000681 |
| GO:0005524 | ATP binding | Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. | 0.0001974 |
| GO:0004672 | protein kinase activity | Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. | 0.0004160 |
| GO:0022857 | transmembrane transporter activity | Enables the transfer of a substance, usually a specific substance or a group of related substances, from one side of a membrane to the other. | 0.0004494 |
| GO:0042626 | ATPase activity, coupled to transmembrane movement of substances | Catalysis of the reaction: ATP + H2O = ADP + phosphate, to directly drive the active transport of a substance across a membrane. | 0.0005768 |
| GO:0004198 | calcium-dependent cysteine-type endopeptidase activity | Catalysis of the hydrolysis of nonterminal peptide bonds in a polypeptide chain by a mechanism using a cysteine residue at the enzyme active center, and requiring the presence of calcium. | 0.0005902 |
| GO:0005525 | GTP binding | Interacting selectively and non-covalently with GTP, guanosine triphosphate. | 0.0006978 |
| GO:0061630 | ubiquitin protein ligase activity | Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. | 0.0010360 |
| GO:0004181 | metallocarboxypeptidase activity | Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0014696 |
| GO:0005230 | extracellular ligand-gated ion channel activity | Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. | 0.0023052 |
| GO:0005249 | voltage-gated potassium channel activity | Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. | 0.0024622 |
| GO:0005507 | copper ion binding | Interacting selectively and non-covalently with copper (Cu) ions. | 0.0025881 |
| GO:0051015 | actin filament binding | Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. | 0.0034984 |
| GO:0016840 | carbon-nitrogen lyase activity | Catalysis of the release of ammonia or one of its derivatives, with the formation of a double bond or ring. Enzymes with this activity may catalyze the actual elimination of the ammonia, amine or amide, e.g. CH-CH(-NH-R) = C=CH- + NH2-R. Others, however, catalyze elimination of another component, e.g. water, which is followed by spontaneous reactions that lead to breakage of the C-N bond, e.g. L-serine ammonia-lyase (EC:4.3.1.17), so that the overall reaction is C(-OH)-CH(-NH2) = CH2-CO- + NH3, i.e. an elimination with rearrangement. The sub-subclasses of EC:4.3 are the ammonia-lyases (EC:4.3.1), lyases acting on amides, amidines, etc. (EC:4.3.2), the amine-lyases (EC:4.3.3), and other carbon-nitrogen lyases (EC:4.3.99). | 0.0038068 |
| GO:0004550 | nucleoside diphosphate kinase activity | Catalysis of the reaction: ATP + nucleoside diphosphate = ADP + nucleoside triphosphate. | 0.0048219 |
| GO:0004601 | peroxidase activity | Catalysis of the reaction: donor + hydrogen peroxide = oxidized donor + 2 H2O. | 0.0049118 |
| GO:0030246 | carbohydrate binding | Interacting selectively and non-covalently with any carbohydrate, which includes monosaccharides, oligosaccharides and polysaccharides as well as substances derived from monosaccharides by reduction of the carbonyl group (alditols), by oxidation of one or more hydroxy groups to afford the corresponding aldehydes, ketones, or carboxylic acids, or by replacement of one or more hydroxy group(s) by a hydrogen atom. Cyclitols are generally not regarded as carbohydrates. | 0.0054564 |
| GO:0005200 | structural constituent of cytoskeleton | The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. | 0.0059853 |
| GO:0016491 | oxidoreductase activity | Catalysis of an oxidation-reduction (redox) reaction, a reversible chemical reaction in which the oxidation state of an atom or atoms within a molecule is altered. One substrate acts as a hydrogen or electron donor and becomes oxidized, while the other acts as hydrogen or electron acceptor and becomes reduced. | 0.0066083 |
| GO:0003777 | microtubule motor activity | Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). | 0.0083292 |
| GO:0047631 | ADP-ribose diphosphatase activity | Catalysis of the reaction: ADP-ribose + H2O = AMP + D-ribose 5-phosphate. | 0.0084694 |
showSigOfNodes(GOdataVar.MF, score(MFvar.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 158
## Number of Edges = 211
##
## $complete.dag
## [1] "A graph with 158 nodes."
I plot variance portion, but for the term found most significant when using p-values, for comparison.
showGroupDensity(GOdataVar.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdataVar.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdataVar.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CCvar.weight01)))
significant.weight01 <- score(CCvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CCvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(CCvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
if (length(sigTerms) == 0) {
sigTerms <- orderedTerms[significant.lea & significant.elim]
}
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.variance.sigTerms <- sigTerms
CCvar.all <- GenTable(GOdataVar.CC, elim=CCvar.elim, weight01=CCvar.weight01, lea=CCvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=max(sum(significant.elim), 10))
kable(
CCvar.all[CCvar.all$Significant > CCvar.all$Expected,],
caption = "Most over-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0016021 | integral component of membrane | 885 | 126 | 80.91 | 1 | 9.2e-20 | 1.6e-18 | 1.3e-24 |
| 2 | GO:0016020 | membrane | 1565 | 199 | 143.08 | 2 | 3.0e-08 | 1.1e-07 | < 1e-30 |
| 3 | GO:0016459 | myosin complex | 33 | 13 | 3.02 | 3 | 6.0e-06 | 6.0e-06 | 6.0e-06 |
| 5 | GO:0005887 | integral component of plasma membrane | 118 | 16 | 10.79 | 5 | 0.00030 | 0.00096 | 2.9e-06 |
| 6 | GO:0005576 | extracellular region | 140 | 23 | 12.80 | 7 | 0.00092 | 0.00111 | 0.00092 |
| 7 | GO:0090575 | RNA polymerase II transcription factor c… | 23 | 5 | 2.10 | 11 | 0.00232 | 0.00148 | 0.00232 |
| 9 | GO:0005886 | plasma membrane | 170 | 20 | 15.54 | 12 | 0.00241 | 0.00259 | 2.9e-08 |
| 10 | GO:0005743 | mitochondrial inner membrane | 33 | 6 | 3.02 | 9 | 0.00111 | 0.00355 | 0.00111 |
| 11 | GO:0005929 | cilium | 27 | 4 | 2.47 | 6 | 0.00091 | 0.00366 | 0.00014 |
| 13 | GO:0098800 | inner mitochondrial membrane protein com… | 21 | 2 | 1.92 | 23 | 0.01870 | 0.00960 | 0.01870 |
| 14 | GO:0016591 | RNA polymerase II, holoenzyme | 20 | 3 | 1.83 | 13 | 0.00262 | 0.01302 | 0.00262 |
| 16 | GO:0030176 | integral component of endoplasmic reticu… | 15 | 2 | 1.37 | 42 | 0.04594 | 0.01619 | 0.04594 |
| 17 | GO:0005764 | lysosome | 8 | 2 | 0.73 | 19 | 0.01668 | 0.01668 | 0.01668 |
kable(
CCvar.all[CCvar.all$Significant < CCvar.all$Expected,],
caption = "Most under-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 4 | GO:0008076 | voltage-gated potassium channel complex | 13 | 1 | 1.19 | 4 | 7.1e-05 | 7.1e-05 | 7.1e-05 |
| 8 | GO:0005634 | nucleus | 531 | 43 | 48.55 | 10 | 0.00137 | 0.00255 | 0.00011 |
| 12 | GO:0005741 | mitochondrial outer membrane | 16 | 0 | 1.46 | 17 | 0.00922 | 0.00922 | 0.00922 |
| 15 | GO:0098797 | plasma membrane protein complex | 34 | 2 | 3.11 | 30 | 0.03400 | 0.01547 | 5.0e-05 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CCvar.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0016021 | integral component of membrane | The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0000000 |
| GO:0016020 | membrane | A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. | 0.0000001 |
| GO:0016459 | myosin complex | A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. | 0.0000060 |
| GO:0008076 | voltage-gated potassium channel complex | A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. | 0.0000713 |
| GO:0005887 | integral component of plasma membrane | The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0009560 |
| GO:0005576 | extracellular region | The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. | 0.0011084 |
| GO:0090575 | RNA polymerase II transcription factor complex | A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. | 0.0014827 |
| GO:0005634 | nucleus | A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. | 0.0025467 |
| GO:0005886 | plasma membrane | The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. | 0.0025944 |
| GO:0005743 | mitochondrial inner membrane | The inner, i.e. lumen-facing, lipid bilayer of the mitochondrial envelope. It is highly folded to form cristae. | 0.0035516 |
| GO:0005929 | cilium | A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. | 0.0036627 |
| GO:0005741 | mitochondrial outer membrane | The outer, i.e. cytoplasm-facing, lipid bilayer of the mitochondrial envelope. | 0.0092228 |
showSigOfNodes(GOdataVar.CC, score(CCvar.weight01),
firstSigNodes = sum(significant.weight01),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 63
## Number of Edges = 115
##
## $complete.dag
## [1] "A graph with 63 nodes."
For comparison, I plot the distribution of variance portion for the CC term found most significant when using p-values.
showGroupDensity(GOdataVar.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdataVar.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdataVar.CC, orderedTerms[3], rm.one=FALSE)
allTerms <- usedGO(GOdata.BP)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% BP.pvalue.sigTerms,
Variance = allTerms %in% BP.variance.sigTerms)))
ggplot(data = data.frame(PValue = rank(score(BP.weight01))[allTerms],
Variance = rank(score(BPvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of BP terms by significance')
allTerms <- usedGO(GOdata.MF)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% MF.pvalue.sigTerms,
Variance = allTerms %in% MF.variance.sigTerms)))
ggplot(data = data.frame(PValue = rank(score(MF.weight01))[allTerms],
Variance = rank(score(MFvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth(methdo='lm') + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of MF terms by significance')
## Warning: Ignoring unknown parameters: methdo
allTerms <- usedGO(GOdata.CC)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% CC.pvalue.sigTerms,
Variance = allTerms %in% CC.variance.sigTerms)))
ggplot(data = data.frame(PValue = rank(score(CC.weight01))[allTerms],
Variance = rank(score(CCvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of CC terms by significance')
I save the main variables to be able to load them in a new R session later.
save(allgenes2GO,
GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
GOdataVar.BP, BPvar.elim, BPvar.weight01, BPvar.lea, BP.variance.sigTerms,
GOdataVar.MF, MFvar.elim, MFvar.weight01, MFvar.lea, MF.variance.sigTerms,
GOdataVar.CC, CCvar.elim, CCvar.weight01, CCvar.lea, CC.variance.sigTerms,
file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Rgraphviz_2.30.0 ggplot2_3.3.0 limma_3.42.2
## [4] knitr_1.28 topGO_2.38.1 SparseM_1.78
## [7] GO.db_3.10.0 AnnotationDbi_1.48.0 IRanges_2.20.2
## [10] S4Vectors_0.24.4 Biobase_2.46.0 graph_1.64.0
## [13] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.0.0 xfun_0.13 purrr_0.3.4 splines_3.6.3
## [5] lattice_0.20-41 colorspace_1.4-1 vctrs_0.2.4 htmltools_0.4.0
## [9] yaml_2.2.1 mgcv_1.8-31 blob_1.2.1 rlang_0.4.6
## [13] pillar_1.4.3 glue_1.4.0 withr_2.2.0 DBI_1.1.0
## [17] bit64_0.9-7 matrixStats_0.56.0 lifecycle_0.2.0 stringr_1.4.0
## [21] munsell_0.5.0 gtable_0.3.0 memoise_1.1.0 evaluate_0.14
## [25] labeling_0.3 highr_0.8 Rcpp_1.0.4.6 scales_1.1.0
## [29] farver_2.0.3 bit_1.1-15.2 digest_0.6.25 stringi_1.4.6
## [33] dplyr_0.8.5 tools_3.6.3 magrittr_1.5 RSQLite_2.2.0
## [37] tibble_3.0.1 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0
## [41] Matrix_1.2-18 assertthat_0.2.1 rmarkdown_2.1 R6_2.4.1
## [45] nlme_3.1-147 compiler_3.6.3